Decentralized Approximate Bayesian Inference for Distributed Sensor Network

Authors

  • Behnam Gholami
  • Sejong Yoon
  • Vladimir Pavlovic
Abstract

Bayesian models provide a framework for probabilistic modelling of complex datasets. Many such models are computationally demanding, especially in the presence of large datasets. In sensor network applications, statistical (Bayesian) parameter estimation usually relies on decentralized algorithms, in which both data and computation are distributed across the nodes of the network. In this paper we propose a framework for decentralized Bayesian learning using the Bregman Alternating Direction Method of Multipliers (B-ADMM). We demonstrate the utility of our framework, with Mean Field Variational Bayes (MFVB) as the primitive, on distributed affine structure from motion (SfM).

Introduction

The traditional setting for many machine learning algorithms is one in which the model (e.g., a classifier or a regressor, typically parametric in some sense) is constructed from a body of data by processing that data in either batch or online fashion. The model itself is centralized, and the algorithm has access to all model parameters and all data points. However, in many application scenarios today it is not reasonable to assume access to all data points, because they may be distributed over a network of sensors or processing nodes. In such settings, collecting and processing the data in a centralized fashion is not always feasible, for several reasons. First, in applications such as networks of cameras mounted on vehicles, the networks operate under severe capacity and energy constraints that considerably limit node communication (Radke 2010; Tron and Vidal 2011). Second, in many distributed sensor network applications such as health care, ecological monitoring, or smart homes, collecting all data at a single location may not be feasible because of its sheer volume as well as potential privacy concerns. Lastly, many sensor network tasks need to be performed in real time, so processing the data only after collecting all information from the different nodes is not an option; moreover, the size of the centralized data would impose an insurmountable computational burden on the algorithm, preventing the real-time or anytime (Zilberstein and Russell 1993) processing often desired in large sensing systems (Giannakis et al. 2015).

Distributed sensor networks provide an application setting in which distributed optimization tasks (including machine learning) that address some of these challenges are frequently considered (Radke 2010; Boyd et al. 2010). However, such tasks are traditionally treated in a non-Bayesian (often deterministic) fashion. Moreover, the data is usually assumed to be complete (not missing) across the network and within individual nodes. As a consequence, these approaches typically obtain parameter point estimates by minimizing a loss function defined on the complete data and dividing the computation into subset-specific optimization problems. A more challenging yet critical problem is to provide full posterior distributions for the parameters estimated in such distributed settings. Such posteriors have the major advantage of characterizing the uncertainty in parameter learning and predictions, which is absent from traditional distributed optimization approaches.
Another drawback of such approaches is that they traditionally rely on batch processing within individual nodes and cannot seamlessly deal with the streaming data frequently present in sensing networks. Both sequential inference and data completion would be handled naturally via Bayesian analysis (Broderick et al. 2013), if one could obtain full posterior parameter estimates in this distributed setting. We also want our distributed Bayesian framework to work not only with discrete variables (Paskin and Guestrin 2004) but also in continuous cases. In a recent work, Yoon and Pavlovic (2012) proposed a method that estimates parametric probabilistic models with latent variables in a distributed network setting. The performance of this model was demonstrated to be on par with that of the centralized model, while efficiently dealing with distributed missing data. Nevertheless, the approach has several drawbacks. First, its use of Maximum Likelihood (ML) estimation increases the risk of overfitting, which is particularly pronounced in the distributed setting where each node works with a subset of the full data. Second, the approach cannot provide a measure of uncertainty around the estimated parameters, which may be crucial in many applications, e.g., in online learning for streaming data or in assessing the confidence of predictions. In this paper we propose a Distributed Mean Field Variational Inference (D-MFVI) algorithm for Bayesian inference in a large class of graphical models. The goal of our framework is to learn a single consensus Bayesian model by performing local Bayesian inference and in-network information sharing, without the need for centralized computation and/or centralized data gathering. In particular, we demonstrate D-MFVI on the Bayesian Principal Component Analysis (BPCA) problem and then apply this model to solve the distributed structure-from-motion task in a camera network.

Bregman Alternating Direction Method of Multipliers (B-ADMM)

ADMM has been successfully applied in a broad range of machine learning applications (Boyd et al. 2010). ADMM is canonically used for optimizing an objective subject to an equality constraint:

\arg\min_{x,z} f(x) + g(z), \quad \text{s.t.} \quad Ax + Bz = c,    (1)

where x \in \mathbb{R}^{n} and z \in \mathbb{R}^{m} are the optimization variables, f and g are convex functions, and A, B, and c are fixed. ADMM iteratively optimizes the augmented Lagrangian of (1), defined as

L_{\eta}(x, z, y) = f(x) + g(z) + \langle y, Ax + Bz - c \rangle + \frac{\eta}{2} \| Ax + Bz - c \|^{2},    (2)

where y is the dual variable, \eta > 0 is a penalty parameter, and the quadratic penalty term penalizes violation of the equality constraint. The optimization is typically accomplished by a three-step update:

x^{t+1} = \arg\min_{x} f(x) + \langle y^{t}, Ax + Bz^{t} - c \rangle + \frac{\eta}{2} \| Ax + Bz^{t} - c \|^{2},    (3)

z^{t+1} = \arg\min_{z} g(z) + \langle y^{t}, Ax^{t+1} + Bz - c \rangle + \frac{\eta}{2} \| Ax^{t+1} + Bz - c \|^{2},    (4)

y^{t+1} = y^{t} + \eta (Ax^{t+1} + Bz^{t+1} - c).    (5)
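As a concrete illustration of the updates (3)-(5), the following minimal sketch (not from the paper) applies them to a toy instance of problem (1) with f(x) = 0.5*||x - a||^2, g(z) = 0.5*lam*||z||^2, A = I, B = -I, and c = 0, so that both subproblems have closed-form solutions; the toy objective, variable names, and parameter values are illustrative assumptions.

```python
import numpy as np

# Toy instance of problem (1): f(x) = 0.5*||x - a||^2, g(z) = 0.5*lam*||z||^2,
# with A = I, B = -I, c = 0, i.e. the consensus constraint x - z = 0.
a = np.array([4.0, -2.0, 1.0])
lam = 0.5   # weight of g
eta = 1.0   # penalty parameter of the augmented Lagrangian (2)

x = np.zeros_like(a)
z = np.zeros_like(a)
y = np.zeros_like(a)  # dual variable

for t in range(100):
    # x-update, Eq. (3): argmin_x 0.5*||x - a||^2 + <y, x - z> + (eta/2)*||x - z||^2
    x = (a + eta * z - y) / (1.0 + eta)
    # z-update, Eq. (4): argmin_z 0.5*lam*||z||^2 + <y, x - z> + (eta/2)*||x - z||^2
    z = (y + eta * x) / (lam + eta)
    # dual update, Eq. (5)
    y = y + eta * (x - z)

# At convergence x and z agree and equal a / (1 + lam), the minimizer of f(x) + g(x).
print(x, z, a / (1.0 + lam))
```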
Bregman ADMM (B-ADMM) replaces the quadratic penalty term in ADMM with a Bregman divergence (Wang and Banerjee 2014). This generalization of the Euclidean metric becomes essential when dealing with densities in the exponential family and with D-MFVI. More precisely, the quadratic penalty term in the x- and z-updates is replaced by a Bregman divergence:

x^{t+1} = \arg\min_{x} f(x) + \langle y^{t}, Ax + Bz^{t} - c \rangle + \eta \, B_{\phi}(c - Ax, Bz^{t}),    (6)

z^{t+1} = \arg\min_{z} g(z) + \langle y^{t}, Ax^{t+1} + Bz - c \rangle + \eta \, B_{\phi}(Bz, c - Ax^{t+1}),    (7)

y^{t+1} = y^{t} + \eta (Ax^{t+1} + Bz^{t+1} - c),    (8)

where B_{\phi} : \Theta \times \Theta \to \mathbb{R}_{+} is the Bregman divergence with Bregman function \phi (a strictly convex function on a closed convex set \Theta), defined as

B_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle,    (9)

where \nabla denotes the gradient operator.

Figure 1: A graphical representation of the model of Eq. (10). The blue-shaded circle denotes the observation.

Distributed Mean Field Variational Inference (D-MFVI)

We first describe a general parametric Bayesian model in a centralized setting. Then we derive its distributed form.

Centralized Setting

Consider a data set of observed D-dimensional vectors X = \{x_n \in \mathbb{R}^{D}\}_{n=1}^{N}, with corresponding local latent variables Z = \{z_n\}_{n=1}^{N}, a global latent variable W, and a set of fixed parameters \Omega = [\Omega_z, \Omega_w]. The main assumption of our class of models is the factorization of the joint distribution of the observations and the global and local variables into a global term and a product of local terms:

P(X, Z, W \mid \Omega) = P(W \mid \Omega_w) \prod_{n=1}^{N} P(x_n \mid z_n, W) \, P(z_n \mid \Omega_z).    (10)

The graphical representation of this class of models is shown in Fig. 1. Given the observations, the goal is to compute (an approximation of) the posterior distribution of the latent variables, P(W, Z \mid X, \Omega). For ease of computation, we assume that the conditional distribution of a latent variable, given the observations and the other latent variables, lies in the exponential family:

P(W \mid X, Z, \Omega_w) = h(W) \exp\{ \psi_w(X, Z, \Omega_w)^{\top} T(W) - A_w(\psi_w(X, Z, \Omega_w)) \}.    (11)
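As a concrete instance of the exponential-family form in Eq. (11), a univariate Gaussian N(mu, sigma^2) can be written as h(w) exp{psi^T T(w) - A(psi)} with natural parameters psi = (mu/sigma^2, -1/(2 sigma^2)), sufficient statistics T(w) = (w, w^2), base measure h(w) = 1/sqrt(2 pi), and log-partition A(psi) = -psi_1^2/(4 psi_2) - (1/2) log(-2 psi_2). The small numerical check below is not from the paper; the chosen distribution and values are illustrative.

```python
import numpy as np

def expfam_gaussian_pdf(w, mu, sigma2):
    """N(mu, sigma2) density written in the form h(w) * exp(psi^T T(w) - A(psi)) of Eq. (11)."""
    psi = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])             # natural parameters
    T = np.array([w, w ** 2])                                        # sufficient statistics
    A = -psi[0] ** 2 / (4.0 * psi[1]) - 0.5 * np.log(-2.0 * psi[1])  # log-partition function
    h = 1.0 / np.sqrt(2.0 * np.pi)                                   # base measure
    return h * np.exp(psi @ T - A)

w, mu, sigma2 = 0.7, 1.5, 2.0
print(expfam_gaussian_pdf(w, mu, sigma2))
# Matches the standard Gaussian density evaluated directly:
print(np.exp(-(w - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2))
```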

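The exponential-family assumption is also what makes the Bregman penalty in Eqs. (6)-(7) a natural replacement for the quadratic one: the KL divergence between two exponential-family densities is itself a Bregman divergence (generated by the log-partition function), and Eq. (9) recovers familiar cases directly. The sketch below (not from the paper; inputs are illustrative) evaluates Eq. (9) for two choices of phi: phi(v) = 0.5*||v||^2, which gives the squared Euclidean penalty of standard ADMM, and the negative entropy phi(v) = sum_i v_i log v_i, which gives the (generalized) KL divergence.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence B_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>, Eq. (9)."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])

# phi(v) = 0.5*||v||^2  =>  B_phi(x, y) = 0.5*||x - y||^2 (the quadratic ADMM penalty)
print(bregman(lambda v: 0.5 * v @ v, lambda v: v, x, y),
      0.5 * np.sum((x - y) ** 2))

# phi(v) = sum_i v_i*log(v_i) (negative entropy)
# =>  B_phi(x, y) = sum_i [x_i*log(x_i/y_i) - x_i + y_i], the generalized KL divergence
print(bregman(lambda v: np.sum(v * np.log(v)), lambda v: np.log(v) + 1.0, x, y),
      np.sum(x * np.log(x / y) - x + y))
```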

Publication date: 2016